ABSTRACT

Four discussions of test construction based on item
response theory (IRT) are presented. The first discussion, "Test
Design as Model Building in Mathematical Programming" (T. J. J. M.
Theunissen), presents test design as a decision process under
certainty. A natural way of modeling this process leads to
mathematical programming. General models of test construction are
discussed, with information about algorithms and heuristics; ideas
about the analysis and refinement of test constraints are also
considered. The second paper, "Methods for Simultaneous Test
Construction" (Ellen Boekkooi-Timminga), gives an overview of
simultaneous test construction using zero-one programming. The item
selection process is based on IRT. Some objective functions and
practical constraints are presented, the construction of parallel
tests is considered, and two tables are provided. The third paper,
"Automated Test Construction Using Minimax Programming" (Wim J. van
der Linden), proposes the use of the minimax principle in IRT test
construction and indicates how this results in test information
functions deviating less systematically from the target function than
for the usual criterion of minimal test length. An alternative
approach and some practical constraints are considered. The final
paper, "A Procedure To Assess Target Information Functions" (Henk
Kelderman), discusses the concept of an information function and its
properties. An interpretable function of information is chosen: the
probability of a wrong order of the ability estimates of two
subjects. (SLD)
Abstract

Test design is presented as a decision process under certainty. A
natural way of modeling this process leads to mathematical
programming. Several models are presented, including information about
algorithms and heuristics. Furthermore, notions about the analysis
and refinement of test constraints are briefly presented.
TEST DESIGN AS MODEL BUILDING IN MATHEMATICAL PROGRAMMING



Introduction 

Programming in the sense of this paper simply means planning. Luce
and Raiffa (1957) categorize mathematical programming as belonging
to the area of (individual) decision making under certainty and
point at the very close relationship between linear programming
(one of many forms of mathematical programming) and two-person
zero-sum game theory. In decision making under certainty each of
the available options leads invariably to a specific (certain)
outcome. Given such a set of options we should choose one that
optimizes some index. The programming problem, as described by Luce
and Raiffa in a very general way, consists of (1) options, where
each option implies the choice of n real numbers, (2) feasibility
conditions, where each condition consists of a (linear) equality or
inequality constraining the options, and (3) an index associated
with each option which is a function of the n numbers. This, in
actual fact, is the 'model' as used in mathematical programming.
The term therefore does not refer to psychometric theory, but to
the structure of a decision process. Several types of decision
processes occur time and again under different disguises in a large
variety of concrete fields of activity. Closer inspection often
reveals a common abstract structure, which turns such a class of
problems into an abstract class, whose members have essentially the
same structure. One such class is the class of Packing problems,
also known as Knapsack problems. Further below, it will
be shown that many test design problems are members of this class.

Assuming the availability of a pool of I items, calibrated with
some I.R.T. model, the total number of possible tests is 2^I,
generally an extremely large number. 'Finding' a particular test in
this universe of tests is usually impossible on practical grounds.
Summarizing, to each item bank belongs a deterministically defined
test universe; the problem is how to find the desired test.
Generally the test universe is reduced in size by practical
considerations, for example, limited testing time for students,
implying a maximum number of items to be used, or by common sense
considerations, e.g., when one does not include easy items in a test
for the selection of scholarship students. Furthermore, in a
psychometrically sophisticated environment quality criteria
may also play their role, e.g., for every knowledge
item there should be two insight items. Obviously, any test design
process is subject to certain constraints. In modeling the decision
process (about what items to include in a test or not), these
constraints are explicitly included. They play the role of
limiting the search process through the test universe. Another way
of looking at it is as follows. Imagine an item bank filled with
three items. The total number of possible tests is 2^3 = 8, being
one test of 0 items, three tests of 1 item, three tests of 2 items
and one test of 3 items. Also imagine that the three items are
represented by three indicator variables x_i, i = 1,2,3, with value
1 if item i is selected for the test and value 0 if it is not.
Let there also be a three-dimensional orthogonal coordinate system,
where each coordinate represents an x-variable (having only the two
values 1 and 0). All possible values of these three x's together
form the collection of vertices of a cube. It is easy to see that
each vertex of this cube represents a member of the test universe.
In this sense, the test universe belonging to an item bank can be
seen as the collection of vertices on the hull of a convex body,
which has a dimensionality equal to the number of
items in the bank. In this case designing a test is like travelling
over the hull and checking each vertex against the constraints.
Having found a subset of vertices admissible under the constraints,
the task is to find that member of the subset giving most
satisfaction in some defined sense.

A test always consists of a certain number of items and
frequently we have started with one item and have been adding items
till we were satisfied. This simple thought suggests that for the
index in the sense used before, we could use a simple sum-function
of indicator variables x_i, which by its nature is linear. It
remains to formulate conditions constraining us in our search for
the 'right' sum-function. Before doing so, another simple
sum-function will be presented first, which has proven to be of wide
applicability. It will be presented in its best known and most
trivial form, as the Knapsack problem (KP). Subsequently, a form of
sensitivity analysis will be presented which can be useful in the
analysis of test specifications. Next, a way to refine test
specifications will be briefly recapitulated. Finally, some problems and
solutions related to the choice of algorithms will be mentioned.

General Models

The KP has many different formulations, of which the one leading
most naturally to the general test design problem will be used.
Suppose a hiker is travelling with a number of objects, each object
having its own monetary value. He is now arriving at a desert which
he will have to cross and knows that he will have to travel as
lightly as possible. This means that he will have to leave some of
the objects behind. He also knows that once across the desert, he
will have to trade with the natives and needs a certain minimum
amount of cash in order to reach his psychometric laboratory again.
All objects have a weight not related to their value. In summary,
his problem is to minimize the weight of the content of his knapsack
while retaining the required minimum value. Since an object
is either present in the knapsack or not, the objects are
represented by binary indicator variables x_i = 0 or 1. (Fractional
objects have no value and negative objects don't exist.) Let w_i be
the weights of the objects, v_i their values and V the lower bound on
the total value of the selected objects. The hiker's problem can then be
presented in formal notation as



    minimize    \sum_{i=1}^{n} w_i x_i

    subject to  \sum_{i=1}^{n} v_i x_i \geq V

    and         x_i = 0 or 1.                              (1)



Due to the constraint in (1), this KP is known as a binary or
zero-one programming problem; if (1) is replaced by an integer-valued
variable x_i ≥ 0, the problem is known as an integer programming
problem. If (1) is formulated as x_i ≥ 0, it is a general linear
programming problem.
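
As a minimal sketch of the hiker's problem, the code below searches all 2^n selections exhaustively. The function name and the example weights and values are invented for illustration; exhaustive search is used here only because the example is tiny, and it is not one of the algorithms discussed later in the paper.

```python
from itertools import product

def solve_hiker_kp(weights, values, min_value):
    """Minimize total weight subject to total value >= min_value, x_i in {0, 1}.

    Exhaustive search over all 2^n selections; only sensible for tiny n.
    Returns (best_weight, best_selection), or None if nothing is feasible.
    """
    best = None
    for x in product((0, 1), repeat=len(weights)):
        value = sum(v * xi for v, xi in zip(values, x))
        if value < min_value:
            continue                      # violates the value constraint
        weight = sum(w * xi for w, xi in zip(weights, x))
        if best is None or weight < best[0]:
            best = (weight, x)
    return best

weights = [4, 3, 2, 5]     # hypothetical w_i
values = [10, 4, 7, 8]     # hypothetical v_i
best_weight, selection = solve_hiker_kp(weights, values, min_value=15)
print(best_weight, selection)
```

With these numbers the lightest admissible knapsack contains the first and third objects.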

The KP as formulated above has been extended to more general
forms, such as the multiple KP and the multidimensional KP. The multiple
KP involves the same problem as above, but now putting the objects
into m knapsacks. The multidimensional version involves
optimization under more than one constraint. In formal notation



    minimize    \sum_{i=1}^{n} \sum_{j=1}^{m} w_i x_{ij}

    subject to  \sum_{i=1}^{n} v_i x_{ij} \geq V

    and         \sum_{j=1}^{m} x_{ij} \leq 1

    with        x_{ij} = 0 or 1,

j = 1,2,...,m; i = 1,2,...,n, for the multiple KP, and

    minimize    \sum_{i=1}^{n} w_i x_i

    subject to  \sum_{i=1}^{n} v_i x_i \geq V

    and, for example,

                \sum_{i=1}^{n} p_i x_i \geq P

with x_i = 0 or 1 for the multidimensional KP, where P is the lower
bound on some second constraint. A further possibility is the
combination of multiple and multidimensional KP.

If the knapsacks in the above examples are seen as tests that have
to be loaded with items, the stage is reached where we have a
simple index, the weighted sum function. If constraints meaningful in
a psychometric sense can be formulated, the stage of test design as
model building in mathematical programming is reached. In many test
applications, a very important issue concerns accuracy of
measurement. Since frequently we are only interested in one particular
point on the ability continuum (e.g., a cut-off score) or at most
in a limited interval of points on the continuum, the notion of
concentrating on local accuracy of measurement suggests itself
automatically. So frequently, part of the test specification
consists of demands as regards local accuracy of estimation for a
number of θ-points, expressed as test information for these points.
Because of the property that test information is the sum of item
information, one could develop the notion that the constraints
(test specifications) take the form of linear functions appearing
in equalities or inequalities. The V and the P in the above KPs can
then be seen as the minimum desired amounts of test information
specified at two θ-points, and v_i and p_i are the item informations of
item i at these two points. For the test design problem the w_i can
be taken to be equal to 1. Writing all this in the more formal
I.R.T.-based notation, it can easily be seen that the test
specification "find the test of minimum length with a certain
amount of information at (for example) two specified θ-points" has
exactly the same structure as a KP-formulation. It looks as follows

    minimize    \sum_{i=1}^{I} x_i

    subject to  \sum_{i=1}^{I} I_i(\theta_1) x_i \geq V

    and         \sum_{i=1}^{I} I_i(\theta_2) x_i \geq P

    and         x_i = 0 or 1.
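
The minimum-length specification can be tried out on a toy bank. The following sketch assumes the Rasch model, for which the item information is P(θ)(1 - P(θ)); the bank, the two θ-points and the targets are invented for illustration, and the subsets are simply enumerated by increasing length rather than found by a programming algorithm.

```python
import math
from itertools import combinations

def rasch_information(theta, b):
    """Item information under the Rasch model: P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def shortest_test(difficulties, thetas, targets):
    """Return the first smallest item subset meeting every information target."""
    n = len(difficulties)
    for length in range(n + 1):
        for items in combinations(range(n), length):
            if all(
                sum(rasch_information(t, difficulties[i]) for i in items) >= target
                for t, target in zip(thetas, targets)
            ):
                return items
    return None            # even the whole bank misses a target

bank = [-1.0, -1.0, 1.0, 1.0, 0.0, 0.0]        # hypothetical difficulties b_i
test = shortest_test(bank, thetas=[-1.0, 1.0], targets=[0.5, 0.5])
print(test)
```

In this example no pair of items reaches both targets, so the shortest admissible test has three items: one located at each θ-point and one in between.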

One can imagine that, in a data base context, each item is
accompanied by a string filled with coded information about various
properties of the item. By using a pointer, items can be rearranged
according to which characteristic is under consideration. For
example, items can be coded as regards content and reordered such that
x_1 to x_100 belong to content domain A, x_101 to x_200 to domain B, and
so forth. This enables us, for example, to add constraints
specifying the proportion of items coming from certain domains to the
model.
A rather different general model in automated test design
involves the use of so-called Matching algorithms. For this model, it is
useful to regard items and abilities as nodes in the bipartite graph
G(V,E), where V_1 is the subset of items, V_2 is the subset of
abilities and E is the set of edges that connect all elements of both
V-subsets with each other. Associated with each edge is a weight with
value I_i(θ_k), i.e., the information value of item i at θ_k. A
matching is defined as a set of edges, where no two edges have a
node in common. One of the questions that could be asked is 'What
is the maximum weight matching?', i.e., identify the subset in E
that is not only a matching, but also has the highest possible
weight-sum associated with its edges.
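
A small sketch may help to fix the idea of a maximum weight matching. It assumes non-negative weights and at least as many items as ability points, so that every ability point is matched to a distinct item; the weight table is invented, and the brute-force search over assignments stands in for a proper matching algorithm.

```python
from itertools import permutations

def max_weight_matching(weights):
    """weights[i][k]: information of item i at ability point k.

    Tries every assignment of distinct items to the ability points and
    returns (total_weight, edges), with edges as (item, ability) pairs.
    """
    n_items, n_abilities = len(weights), len(weights[0])
    best = None
    for items in permutations(range(n_items), n_abilities):
        total = sum(weights[i][k] for k, i in enumerate(items))
        if best is None or total > best[0]:
            best = (total, [(i, k) for k, i in enumerate(items)])
    return best

info = [            # hypothetical information values, rows = items
    [9, 2, 1],
    [8, 7, 2],
    [1, 6, 5],
    [2, 3, 8],
]
total, edges = max_weight_matching(info)
print(total, edges)
```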

Models like this are useful if a number of tests of equal length
that are non-overlapping in the items and of gradually increasing
difficulties are to be designed. Details about this approach are
presented elsewhere. It can be shown that the structure of the
problem is such that, without specifying this as a constraint in
the model, the solution is always integer valued. If an upper bound
of 1 to the variables is specified, the solution is automatically a
zero-one solution. This implies that standard Linear Programming
algorithms can be used, which are readily available.

In order to go to the next section it is necessary to anticipate
the section on algorithms and heuristics. For the moment
it is sufficient to say that in many practical situations standard
Linear Programming (LP) algorithms can also be used in the case of
KP-like formulations of test design. This matter will be picked up
again later.

Analysis of Test Design Constraints 

LP theory entails a powerful theorem and some very useful
techniques. The theorem referred to is the Duality theorem. Regard the
following LP problem:

    P(1) (primal)   maximize    {c'x}

                    subject to  Ax \leq b
                    and         x \geq 0.

The duality theorem says that associated with P(1) there is an
equivalent LP problem P(2), formulated as follows

    P(2) (dual)     minimize    {b'u}

                    subject to  A'u \geq c
                    and         u \geq 0.

(Notice the symmetry of both formulations, written in matrix
notation.) Proof of this theorem and results stemming from it can be
found in the literature (see Papadimitriou and Steiglitz, 1982).
For our purpose it is sufficient to know that the optimal value of
the target function is the same in P(1) and P(2). An interesting
feature is the appearance in the Dual of new variables u,
associated with the right-hand side constraints b of the Primal. These new
variables are called the Shadow Prices (SP) of the Primal (the name
originates from economic theory). Inspection of shadow prices can
yield interesting insights into the original primal constraints. It
can be shown that if z = b'u, the following holds

    \partial z / \partial b_i = u_i,    i = 1,2,...,m,

which means that if b_i changes into b_i + Δb_i, u_i shows the
corresponding change in the target function z; z changes by u_i Δb_i.
This means that if the SP of a constraint is equal to 0, this
constraint is redundant. SPs are standard output in most commercially
available LP packages. It should be noted that the above
interpretation of SPs is only valid for certain ranges of b. This range is
known as the right-hand side range. Changes outside the range and
changes in several right-hand side coefficients at the same time
are studied in parametric programming. This matter will not be
pursued any further here.
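
A small worked example may illustrate the duality theorem and the interpretation of shadow prices. The numbers are invented for this illustration and do not come from a test design problem.

```latex
\begin{align*}
\text{Primal:}\quad & \max\; z = 3x_1 + 2x_2
  && \text{s.t. } x_1 + x_2 \le 4,\; x_1 \le 2,\; x_1, x_2 \ge 0, \\
\text{Dual:}\quad & \min\; 4u_1 + 2u_2
  && \text{s.t. } u_1 + u_2 \ge 3,\; u_1 \ge 2,\; u_1, u_2 \ge 0, \\
\text{Optima:}\quad & x^* = (2, 2),\; z^* = 10
  && u^* = (2, 1),\; b'u^* = 4 \cdot 2 + 2 \cdot 1 = 10, \\
\text{Check:}\quad & b_1 : 4 \to 5 \;\Rightarrow\; x^* = (2, 3),\; z^* = 12
  && \Delta z = 2 = u_1^* \, \Delta b_1 .
\end{align*}
```

Both optimal values equal 10, and relaxing the first constraint by one unit raises the optimum by exactly its shadow price u_1 = 2. Relaxing the second constraint by one unit gives x* = (3, 1) and z* = 11, an increase of u_2 = 1.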

Logical Conditions in Test Design 

Considering test design as a problem in zero-one programming (as
first formulated in Theunissen, 1985) supplies a natural opening
for the introduction of Boolean variables (Theunissen, 1986). These
variables are useful if one wants to put logical conditions on test
design. Suppose it is stipulated in the test specification that if
item x_1 or x_2 is selected, then at least one of the items x_3, x_4,
x_5 must also be chosen. (Instead of individual items, the x's may
also represent strings of items.) If we use '→' as notation for
'if ... then' and 'V' for inclusive 'or' (a or b or both), we have two
propositions, (x_1 V x_2) and (x_3 V x_4 V x_5), connected as follows:

    (x_1 V x_2) → (x_3 V x_4 V x_5).

It is obvious how the separate propositions are entered as
constraints:

    x_1 + x_2 \geq 1
    and x_3 + x_4 + x_5 \geq 1.

What remains to be done is to connect these two propositions. We
now introduce a new variable d and 'translate' the propositions as
follows:



    x_1 + x_2 \geq 1  →  d = 1

and

    d = 1  →  x_3 + x_4 + x_5 \geq 1.

This gives rise to the following constraints:

    x_1 + x_2 - 2d \leq 0

and

    -x_3 - x_4 - x_5 + d \leq 0    (with d as binary variable).
This models our logical condition. Situations can be imagined where
the item selection process is steered in a very detailed way, for
example, to avoid dependencies among items. Even for relatively
small sets of items this may result in rather long and complex
Boolean expressions. It is useful to know that reduction algorithms
for such complex Boolean formulations exist. They are used, for
example, in the algebra of switching circuits (see e.g., Graham
Flegg, 1965). This matter will not be pursued here any further.
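
That the two linear constraints with the auxiliary binary variable d reproduce the logical condition can be checked by enumeration. The sketch below is only a verification aid; the function names are introduced here for illustration.

```python
from itertools import product

def condition_holds(x):
    """Logical condition: (x1 or x2) implies (x3 or x4 or x5)."""
    x1, x2, x3, x4, x5 = x
    return not (x1 or x2) or bool(x3 or x4 or x5)

def constraints_feasible(x):
    """True if some binary d satisfies both linear constraints."""
    x1, x2, x3, x4, x5 = x
    return any(
        x1 + x2 - 2 * d <= 0 and -x3 - x4 - x5 + d <= 0
        for d in (0, 1)
    )

selections = list(product((0, 1), repeat=5))
# The constraints admit a selection exactly when the condition holds.
agree = all(condition_holds(x) == constraints_feasible(x) for x in selections)
n_allowed = sum(constraints_feasible(x) for x in selections)
print(agree, n_allowed)
```

Of the 32 possible selections of the five items, only the three that contain x_1 or x_2 but none of x_3, x_4, x_5 are excluded.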

In the final section of this paper some aspects of the practical
implementation of the above models will be treated.

Algorithms and Heuristics 

In discrete optimization theory, a useful distinction is the one
between algorithms that work in polynomial time (P-algorithms) and
those that work in non-polynomial time (NP-algorithms). Working in
polynomial time means that the CPU-time necessary for the solution
of the problem is a polynomial function of the size of the input.
The input of an algorithm is basically a string of symbols. The
size is the length of this sequence, i.e., the number of symbols in it
(Papadimitriou and Steiglitz, 1982). For our type of problem, i.e.,
many variables (items) and relatively few constraints, the size is
strongly determined by the number of items. In NP-algorithms the
required CPU-time is generally an exponential function of
input size. This means that there is no guarantee that the solution can
be found in reasonable time, although one does not always know this
in advance; for example, it is known (see Papadimitriou and
Steiglitz, 1982) that the simplex algorithm and its derivatives in
LP are NP-algorithms, but in practice extremely large problems
involving thousands of variables and constraints have been solved
without any time problems. Integer programming, which searches for
solutions that have only integer values and of which binary
programming is a particular instance, is known to be an NP-problem
(Papadimitriou and Steiglitz, 1982). Here it is known that time
demands may frequently turn out to be excessive in the case of
moderately-sized problems. A well-known algorithm in binary programming
is the Balas algorithm (for details see e.g., Syslo, Deo and
Kowalik, 1983). A strategy often taken in practice is that one
first finds an approximate solution and uses this as a starting
point for the Balas algorithm to find a purely binary solution. The
Balas algorithm makes use of Branch and Bound techniques, which are
extensively used in all sorts of heuristics. A brief recapitulation
of the basic ideas of Branch and Bound techniques will therefore be
useful. The important point to remember is that, no matter its
form, a Branch and Bound technique basically is a strategy to check
the vertices on the hull of a convex body (see Introduction).
Assuming a start-solution, obtained by LP, the first step is to choose
a branch variable x_i, for example, the x with the highest
fractional value. The second step is to create two sub-problems, one with
x_i = 0 and one with x_i = 1, both together with all other variables.
The value of the target function is now not larger and usually
lower, since we have the original LP with more constraints. If the
solution is now completely integer, one stops;
if not, one goes on. Getting lower in the search tree gives
steadily lower values for the target function. Finding the first purely
binary solution could be defined as producing the first bound. The
next step is to go back to a new candidate variable and repeat the
process again. Any further development at a branch is stopped when
the value of the target drops below the current bound before having
reached an integer solution. This process continues till the search
tree is complete. The choice of candidate variables (branching) and
the definition of the nature of the bounds determine the nature
of the B and B algorithm. Obviously, this type of algorithm can
also be used without prior approximate solution. As noted, however,
time demands may be excessive. Therefore, some heuristic
approaches are now presented. The effectiveness of a number of these
heuristics in test design was recently investigated by Boomsma
(1986).
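
The branching and bounding steps described above can be sketched for the simplest case, a maximization KP with a single constraint. This is not the Balas algorithm: it is a plain depth-first Branch and Bound in which the bound is the optimum of the continuous relaxation of the remaining problem, and the function name and example data are assumptions made for illustration.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Maximize total value subject to total weight <= capacity, x_i in {0, 1}.

    Assumes positive weights and non-negative values.
    Returns (best_value, sorted list of selected item indices).
    """
    # Branch on items in order of decreasing value per unit of weight.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    best = {"value": 0, "items": []}

    def relaxation_bound(k, room, value):
        """Optimum of the continuous relaxation over the items order[k:]."""
        for i in order[k:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    def branch(k, room, value, chosen):
        if value > best["value"]:              # new bound from a binary solution
            best["value"], best["items"] = value, sorted(chosen)
        if k == len(order) or relaxation_bound(k, room, value) <= best["value"]:
            return                             # leaf reached or branch pruned
        i = order[k]
        if weights[i] <= room:                 # sub-problem with x_i = 1
            branch(k + 1, room - weights[i], value + values[i], chosen + [i])
        branch(k + 1, room, value, chosen)     # sub-problem with x_i = 0

    branch(0, capacity, 0, [])
    return best["value"], best["items"]

best_value, items = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best_value, items)
```

Pruning occurs whenever the relaxation of a sub-problem cannot improve on the best binary solution found so far, which is the stopping rule described in the text.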

A well-known theorem in mathematical programming states that if
we regard a continuous multidimensional KP (which is the same as
saying we regard an LP with a general upper bound of 1 for the
variables), then the solution for this KP contains at most as many
fractional values as there are constraints,
with all other values integer, 1 or 0. Since in many test design
problems the number of constraints will be low relative to the
number of variables (items), this is a very useful theorem. In the
experience of the author, simply rounding off the fractional values,
keeping an eye on the constraints, produces excellent results at a
low price. Boomsma (1986) found excellent results with this
heuristic when he rounded all fractional values upward to 1. This
guarantees results fulfilling the constraints. Inserting a
backtrack mechanism by which it is checked if setting one of these
rounded variables to 0 will improve the solution without violating
the constraints was the final embellishment. Since the number of
constraints is generally small, the solution found in this way is
excellent. However, it is useful to have other heuristics that for
their effectiveness are not so dependent on the number of
constraints. Another heuristic investigated by Boomsma (1986) is the
so-called Lagrangian heuristic. It is mentioned here because there
is some evidence that it performs well in the case of uniform test
information functions (Theunissen, 1986), and also because it leads
to his best general purpose heuristic, i.e., the heuristic with
surrogate constraints. Suppose we have the optimization problem

    L(1)    maximize    v'x

            subject to  Ax \leq b, and x_i = 0 or 1,

then a theorem by Everett (see Salkin, 1975) says that if L is a
vector of Lagrange multipliers and x_0 solves the problem

    L(2)    maximize    v'x - LAx,

            subject to  x_i = 0 or 1,

x_0 will also solve L(1), with b replaced by Ax_0. Algorithms exist
that systematically vary L until a vector x_0 is found for which Ax_0
approximates b as closely as possible. The 'surrogate constraints' heuristic
is essentially a Lagrangian procedure, with as multipliers the
optimal values of the dual of the continuous version of the original
primal. For comparisons as regards the effectiveness, the reader is
referred to Boomsma (1986).
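
The Lagrangian step can be illustrated with a short sketch. It assumes non-negative multipliers and uses the fact that problem L(2) separates per item: x_i = 1 exactly when v_i minus the multiplier-weighted column of A is positive. The data and the multiplier are invented, and the brute-force check of L(1) is included only to show the statement of the theorem on a tiny case.

```python
from itertools import product

def lagrangian_solution(v, A, multipliers):
    """Solve L(2): maximize v'x - L A x over binary x, item by item."""
    return [
        1 if v[i] - sum(l * row[i] for l, row in zip(multipliers, A)) > 0 else 0
        for i in range(len(v))
    ]

def best_value(v, A, b):
    """Brute-force optimum of L(1): maximize v'x subject to Ax <= b, x binary."""
    feasible = (
        x for x in product((0, 1), repeat=len(v))
        if all(sum(a * xi for a, xi in zip(row, x)) <= bk for row, bk in zip(A, b))
    )
    return max(sum(vi * xi for vi, xi in zip(v, x)) for x in feasible)

v = [6, 5, 4, 3]            # hypothetical item values
A = [[2, 3, 4, 5]]          # one hypothetical resource constraint
x0 = lagrangian_solution(v, A, multipliers=[1.2])
usage = [sum(a * xi for a, xi in zip(row, x0)) for row in A]    # A x0
value_x0 = sum(vi * xi for vi, xi in zip(v, x0))
print(x0, usage, value_x0, best_value(v, A, usage))
```

Here x_0 uses 5 units of the resource and attains the value 11, which is also the optimum of L(1) when the right-hand side is set to 5.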




References 

Boomsma, Y. (1986). Item selection by mathematical programming
[Bulletinreeks nr. 47]. Arnhem: Cito.

Luce, R.D., & Raiffa, H. (1957). Games and decisions. New York:
Wiley.

Papadimitriou, C.H., & Steiglitz, K. (1982). Combinatorial
optimization: Algorithms and complexity. Englewood Cliffs:
Prentice-Hall.

Syslo, M.M., Deo, N., & Kowalik, J.S. (1983). Discrete optimization
algorithms. Englewood Cliffs: Prentice-Hall.

Salkin, H. (1975). Integer programming. Reading: Addison-Wesley.

Theunissen, T.J.J.M. (1985). Binary programming and test design.
Psychometrika, 50, 411-420.

Theunissen, T.J.J.M. (1986). Some applications of optimization
algorithms in test design and adaptive testing. Applied
Psychological Measurement, 10, in press.



Methods for Simultaneous Test Construction 



Ellen Boekkooi-Timminga






Summary 



An overview of simultaneous test construction methods using
zero-one programming is given. The item selection process is based
on the concept of information from item response theory. Next, some
objective functions and practical constraints useful in
simultaneous test construction are presented. Then, the special
case of constructing parallel tests is considered. The paper ends
with a few examples.
Some Methods for Simultaneous Test Construction

Recently, a start has been made with research on test construction
from item banks using mathematical programming, in particular
zero-one programming. The idea to adopt such an approach to test
construction has been presented in a paper by Theunissen (1985). It
has been further explored in a series of papers by
Boekkooi-Timminga (1986, 1987), Boomsma (1986), Theunissen (1986),
Theunissen and Verstralen (1986) and van der Linden and
Boekkooi-Timminga (1986, 1987). Some references to operations research
methods are Rao (1984), Syslo, Deo, and Kowalik (1983), Wagner
(1972), and Williams (1978).

In this paper, methods to construct two or more tests at the
same time from an item bank are presented. The possibility of doing
so is of great importance whenever tests with a certain
relationship to each other have to be constructed, for instance,
parallel tests or tests with increasing difficulty levels
(Boekkooi-Timminga, 1987).

The actual process of item selection is based on the concept of
information from item response theory. All items are assumed to fit
the same one-dimensional item response model. Furthermore,
maximum-likelihood estimation of the subjects' abilities is assumed, so
that the item and test score information functions have the
property of additivity. Target values for the test information
functions are specified by the test constructor at some prechosen
ability levels. A procedure to obtain target values from test
constructors is described by Kelderman (1987).

Simultaneous Test Construction: The General Case 

Simultaneous test construction can be viewed as a
generalization of the test construction method proposed by
Theunissen (1985). The two models dealt with below clearly
illustrate this. The model in (1) - (3) specifies the test
construction model for one test described by Theunissen (1985). The
model minimizes the number of items in the test subject to the
constraints that the actual test information function values should
exceed the target values at all K ability levels considered, where
I_t(θ_k) is the desired test information function value of test t at
ability level k (the subscript t is dropped when only one test is
constructed). The model in (4) - (6) describes the construction of T
tests at the same time. The total number of items over all T tests
is minimized, under the constraint that for each test t and each
ability level k the actual test information function values should
exceed the values I_t(θ_k). The following definitions will be
used: I_i(θ_k) is the item information function value for item i at
ability level k. The decision variables x_i indicate if item i is
selected (x_i = 1) or not (x_i = 0), whereas x_it indicates whether or
not item i is selected for test t. The total number of items in the
item bank is denoted by I.

The model for the construction of one test is as follows 

    (1)   minimize   \sum_{i=1}^{I} x_i

    subject to

    (2)   \sum_{i=1}^{I} I_i(\theta_k) x_i \geq I(\theta_k),    k = 1,...,K

    (3)   x_i \in \{0,1\},    i = 1,...,I

The model for constructing T tests simultaneously is

    (4)   minimize   \sum_{i=1}^{I} \sum_{t=1}^{T} x_{it}

    subject to

    (5)   \sum_{i=1}^{I} I_i(\theta_k) x_{it} \geq I_t(\theta_k),    t = 1,...,T;  k = 1,...,K

    (6)   x_{it} \in \{0,1\},    i = 1,...,I;  t = 1,...,T
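
A minimal sketch of the simultaneous model (4) - (6) for a toy bank follows. The information values are invented integers (to keep the comparisons exact), the function name is hypothetical, and the search is exhaustive rather than a zero-one programming algorithm. By default the sketch also forbids overlap between tests, which is an extra constraint and not part of (4) - (6).

```python
from itertools import product

def simultaneous_tests(info, targets, allow_overlap=False):
    """Brute-force search for the model in (4)-(6).

    info[i][k]    : information of item i at ability level k
    targets[t][k] : target information of test t at ability level k
    allow_overlap : if False, an item may enter at most one test
    Returns (total_items, tests) with tests as lists of item indices, or None.
    """
    n_items, n_tests = len(info), len(targets)
    masks = range(2 ** n_tests)               # bit t set: item is in test t
    if not allow_overlap:
        masks = [m for m in masks if bin(m).count("1") <= 1]
    best = None
    for assignment in product(masks, repeat=n_items):
        tests = [[i for i, m in enumerate(assignment) if (m >> t) & 1]
                 for t in range(n_tests)]
        feasible = all(
            sum(info[i][k] for i in tests[t]) >= targets[t][k]
            for t in range(n_tests) for k in range(len(targets[t]))
        )
        total = sum(len(test) for test in tests)
        if feasible and (best is None or total < best[0]):
            best = (total, tests)
    return best

info = [[5, 1], [4, 2], [1, 5], [2, 4], [3, 3], [2, 2]]   # hypothetical values
targets = [[6, 6], [6, 6]]                                # same targets, T = 2
total, tests = simultaneous_tests(info, targets)
print(total, tests)
```

With these values each test needs at least two items, and the only way to build two non-overlapping two-item tests is to combine items 0 and 2 in one test and items 1 and 3 in the other.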

Instead of minimizing the number of items, many other objective
functions may be used (van der Linden & Boekkooi-Timminga, 1987).
In simultaneous test construction there are several possibilities.
The objective function can consider aspects of all, a few or one of
the tests to be constructed, for instance, the total number of
items in all or in one of the tests. Some objective functions are
exclusively to be used in simultaneous test construction. This is
the case when objective functions consider an aspect taking into
account a relationship between all or some of the tests, such as
the difference in the actual test information function values at
all or some of the ability levels considered between all or some of
the tests to be constructed. In Figure 1 three possible objective
functions for simultaneous test construction are presented.



Insert Figure 1 about here 



During the optimization process all kinds of constraints can be
taken into consideration. An overview of some constraints to be
used in both simultaneous test construction and the construction of
one test at a time is given in van der Linden and Boekkooi-Timminga
(1987), Theunissen (1987) and van der Linden (1987). Some
constraints to be used in simultaneous test construction are listed
in Figure 2.



Insert Figure 2 about here 
Constructing Parallel Tests

In this section three methods for the simultaneous construction 
of parallel tests are discussed. Tests are considered to be 
parallel if their information functions are the same (Samejima, 
1977). In addition to this statistical definition, it is possible 
to guarantee that tests are also parallel as regards content. To 
achieve this, additional constraints should be added in the test 
construction model. A discussion of these constraints concludes 
this section. 

A possible approach to constructing parallel tests is a
sequential procedure in which tests are selected one after the other
using a test construction model with the same specifications.
However, practical experience with this approach shows that such
tests tend to be far from parallel. Parallel tests can be
constructed well using simultaneous test construction methods. Three
methods for simultaneously constructing parallel tests are
described in Boekkooi-Timminga (1986). The first method assigns
items to tests. The other two methods match the tests item by item.

The objective function in the first method is based on a measure 
of difference between the tests to be constructed. For instance, 
objective function 3 in Figure 1 minimizes the maximum absolute 
distance between the actual test information functions. With this 
function, the same target test information function values are 
required for each test (Figure 2, constraint 1), no overlap of 
items between the tests is allowed (Figure 2, constraint 7), and,
if necessary, an equal number of items is assigned to each test
(Figure 2, constraint 4).

The second and third methods are based on a measure of
difference at item level. This measure, c_ij for items i and j, may
be, e.g., the difference in difficulty level when the Rasch model
is considered. Using method 2, items with minimum difference are
assigned to different tests, subject to the condition that the test
information functions satisfy the target. The third method assumes
that the item bank is partitioned into as many comparable parts
as there are tests to be constructed. Then, the procedure of method 2 is
applied, selecting one test from each set in the partition. For the
construction of two parallel tests, the test construction model for
the second method is as follows



(7)   minimize $\sum_{i=1}^{I} \sum_{j=1}^{I} c_{ij} x_{ij}$

subject to

(8)   $\sum_{i=1}^{I} x_{ij} + \sum_{i=1}^{I} x_{ji} \leq 1$,   $j = 1, \ldots, I$

(9)   $\sum_{i=1}^{I} \sum_{j=1}^{I} I_i(\theta_k) x_{ij} \geq I_t(\theta_k)$,   $k = 1, \ldots, K$

(10)  $\sum_{i=1}^{I} \sum_{j=1}^{I} I_j(\theta_k) x_{ij} \geq I_t(\theta_k)$,   $k = 1, \ldots, K$

(11)  $x_{ij} \in \{0, 1\}$,   $i = 1, \ldots, I$;  $j = 1, \ldots, I$

where $c_{ij}$ is large, compared to the other $c_{ij}$ values,
whenever i = j. The decision variables $x_{ij}$ are equal to one if
items i and j are matched. Items i and j should then be included in
the first and second test, respectively. For both tests the same
target test information functions are specified in (9) and (10).
Constraint (8) indicates that an item may be selected for one test
only.
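
The matching model in (7) through (11) can be illustrated with a small sketch. The code below is not the branch-and-bound procedure used in the paper; it simply enumerates all feasible matchings, which is only workable for a handful of items. It assumes a bank restricted to six items of Table 1 (an arbitrary choice made to keep the enumeration small), a target of 0.4 at the three ability levels, and the sum of squared differences in item information as $c_{ij}$. All function names are hypothetical.

```python
from itertools import combinations, permutations

# Item information at theta = -1, 0, 1 for six items taken from Table 1.
INFO = {
    1: (0.091, 0.116, 0.118),
    3: (0.168, 0.151, 0.101),
    4: (0.080, 0.092, 0.088),
    10: (0.165, 0.193, 0.155),
    12: (0.168, 0.207, 0.170),
    14: (0.173, 0.219, 0.182),
}
TARGET = 0.4  # assumed target value at every ability level
EPS = 1e-9    # tolerance for floating-point comparisons


def cost(i, j):
    """c_ij: sum of squared differences in item information."""
    return sum((a - b) ** 2 for a, b in zip(INFO[i], INFO[j]))


def information(items):
    """Information of a set of items at each ability level."""
    return [sum(INFO[i][k] for i in items) for k in range(3)]


def meets_target(items):
    return all(value >= TARGET - EPS for value in information(items))


def match_tests():
    """Enumerate all feasible matchings and return the cheapest one."""
    best = None
    ordered_pairs = list(permutations(INFO, 2))  # all (i, j) with i != j
    for m in range(1, len(INFO) // 2 + 1):
        for pairs in combinations(ordered_pairs, m):
            used = [item for pair in pairs for item in pair]
            if len(set(used)) < len(used):
                continue  # constraint (8): an item is used at most once
            first = [i for i, _ in pairs]
            second = [j for _, j in pairs]
            if not (meets_target(first) and meets_target(second)):
                continue  # constraints (9) and (10)
            total = sum(cost(i, j) for i, j in pairs)  # objective (7)
            if best is None or total < best[0]:
                best = (total, pairs)
    return best


best_cost, best_pairs = match_tests()
print(best_pairs, round(best_cost, 5))
```

Because i and j range over distinct items here, the large penalty on $c_{ii}$ is not needed in this sketch.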

By including some extra constraints in the test construction
models, it is possible to ensure that the tests are parallel as
regards content. Indicator variables $p_{is}$ are used to indicate
whether item i covers a certain subject matter s ($p_{is} = 1$) or
not ($p_{is} = 0$). Let S be the number of topics of interest during
the selection process. Then, (12) gives a set of constraints
requiring that the proportions $a_1, a_2, \ldots, a_S$ of items in
the test from topics s must be the same for all tests t.

(12)  $a_1 \sum_{i=1}^{I} p_{i1} x_{it} = a_2 \sum_{i=1}^{I} p_{i2} x_{it} = \cdots = a_S \sum_{i=1}^{I} p_{iS} x_{it}$,   $t = 1, \ldots, T$

Examples 

Three examples of parallel tests constructed on the basis of their
test information functions are given (see also Boekkooi-Timminga,
1986). Examples 1, 2, and 3 were based on the methods described in
the previous section. Two parallel tests had to be constructed. The
test information function values were considered at the ability
levels $\theta = -1, 0, 1$. The target values were the same for each
example: a test information value of at least 0.4 at all ability
levels considered. An item bank of 14 items was used. In Table 1
the item parameters and item information function values are given.
Since the purpose of these examples was only to explore the
behavior of the three methods, a small bank sufficed. Applications
to more realistic domains of items have to wait for solutions to
the computational complexity of zero-one programming problems.



Insert Table 1 about here 



The algorithm used for solving the problems was a branch-and-bound
algorithm developed by Land and Doig (1960), implemented on a
DEC-2060 computer.

In the first example objective function 3 from Figure 1 was
used. It had two versions: one without (1a) and the other with (1b)
the constraint of both tests containing the same number of items.
In Examples 2 and 3, the sum of the squared absolute differences in
item information over the three ability levels was considered as a
measure for the differences between the items. In Example 3 the
item bank was divided into two equivalent parts. Part one consisted
of the items 2, 5, 6, 7, 9, 11, 14 and part two of the items 1, 3,
4, 8, 10, 12, 13. The results are summarized in Table 2. For each
example: (1) the items selected, (2) the number of items selected,
(3) the test information function values, (4) the maximum distance
y between the actual and target test information function values,
and (5) the maximum distance y* between the actual test information
function values of the two constructed tests are given. For Methods
2 and 3, the following item pairs were produced: (1-4), (3-10),
(14-12) and (6-13), (7-4), (14-12), respectively.



Insert Table 2 about here 



It is clear that y* was smallest for Method 1. This result was not
unexpected because this method explicitly minimizes the distance
between the test information functions. In contrast, the value of y
was much larger for this method than for the other two methods.
Which method should be considered best is mainly a matter of taste,
depending on which objective the test constructor finds most
important to optimize.
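
The quantities y and y* reported in Table 2 can be recomputed from the item information values in Table 1. The sketch below does this for Example 2 only; the target value of 0.4 and the function names are assumptions made for the illustration.

```python
# Item information at theta = -1, 0, 1, copied from Table 1 for the
# six items selected in Example 2.
ITEM_INFO = {
    1: (0.091, 0.116, 0.118),
    3: (0.168, 0.151, 0.101),
    4: (0.080, 0.092, 0.088),
    10: (0.165, 0.193, 0.155),
    12: (0.168, 0.207, 0.170),
    14: (0.173, 0.219, 0.182),
}
TARGET = 0.4  # assumed target value at each ability level


def information(items):
    """Test information at each ability level, rounded as in Table 2."""
    return [round(sum(ITEM_INFO[i][k] for i in items), 3) for k in range(3)]


def max_distance_to_target(items):
    """y: largest excess of the test information over the target."""
    return round(max(value - TARGET for value in information(items)), 3)


def max_distance_between(items_a, items_b):
    """y*: largest absolute difference between two tests."""
    pairs = zip(information(items_a), information(items_b))
    return round(max(abs(a - b) for a, b in pairs), 3)


first, second = [1, 3, 14], [4, 10, 12]
print(information(first))                   # [0.432, 0.486, 0.401]
print(information(second))                  # [0.413, 0.492, 0.413]
print(max_distance_to_target(first))        # 0.086
print(max_distance_to_target(second))       # 0.092
print(max_distance_between(first, second))  # 0.019
```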
Conclusion

In this paper a description of simultaneous test construction
methods using zero-one programming was given. First, it was shown
that simultaneous test construction methods are a generalization of
the method for the construction of one test proposed by Theunissen
(1985). Then, three models for the construction of parallel tests
were presented. In these methods, both statistical and content
aspects can be taken into consideration. Three examples were given
using the methods described for the construction of parallel tests.

Algorithms for solving zero-one programming problems are known,
and computer packages in which these algorithms are implemented are
widely available nowadays. However, an important problem with
zero-one programming problems is their computational complexity
(Lenstra & Rinnooy Kan, 1979). If one test at a time has to be
constructed, CPU time can be reduced by relaxation, which means
that the decision variables $x_i$ are allowed to take values between
zero and one. However, when simultaneous test construction is
involved this is not possible, because it could lead to solutions
in which items are partly included in different tests. Before
simultaneous test construction methods can be used in everyday
testing practice, more research on algorithms and approximations
will be needed. Given the large amount of research addressing this
topic as well as the number of quick approximate methods already
obtained, it is expected that fast algorithms will be found before
long.
Figure 1. Some Objective Functions for Simultaneous Test
Construction

1. Minimizes the total number of items in all tests:

      min $\sum_{i=1}^{I} \sum_{t=1}^{T} x_{it}$

2. Minimizes the sum of the distances between the target test
   information functions and the actual test information functions
   at the ability levels considered:

      min $\sum_{i=1}^{I} \sum_{t=1}^{T} \sum_{k=1}^{K} x_{it} I_i(\theta_k)$

   subject to

      $\sum_{i=1}^{I} x_{it} I_i(\theta_k) \geq I_t(\theta_k)$,   $t = 1, \ldots, T$;  $k = 1, \ldots, K$

3. Minimizes the maximum absolute distance y between the
   information functions of test 1 and 2 at the ability levels
   considered:

      min y

   subject to

      $\sum_{i=1}^{I} I_i(\theta_k) x_{i1} - y - \sum_{i=1}^{I} I_i(\theta_k) x_{i2} \leq 0$,   $k = 1, \ldots, K$

      $-\sum_{i=1}^{I} I_i(\theta_k) x_{i1} - y + \sum_{i=1}^{I} I_i(\theta_k) x_{i2} \leq 0$,   $k = 1, \ldots, K$
Figure 2. Some Constraints for Simultaneous Test Construction

1. Target test information function values:

      $\sum_{i=1}^{I} x_{it} I_i(\theta_k) \geq I_t(\theta_k)$,   $t = 1, \ldots, T$

2. The number of items desired for each of the tests t:

      $\sum_{i=1}^{I} x_{it} = n_t$,   $t = 1, \ldots, T$

3. Total number of items in all tests:

      $\sum_{i=1}^{I} \sum_{t=1}^{T} x_{it} = n$

4. Proportions of items selected for each test given by
   $b_1, b_2, \ldots, b_T$:

      $b_1 \sum_{i=1}^{I} x_{i1} = b_2 \sum_{i=1}^{I} x_{i2} = \cdots = b_T \sum_{i=1}^{I} x_{iT}$

5. Item i must be excluded from all tests:

      $x_{it} = 0$,   $t = 1, \ldots, T$

6. Item i must be included in precisely one of the tests:

      $\sum_{t=1}^{T} x_{it} = 1$

7. Each item must be included in at most one test:

      $\sum_{t=1}^{T} x_{it} \leq 1$,   $i = 1, \ldots, I$

8. Proportions of items selected from each topic given by
   $a_1, a_2, \ldots, a_S$:

      $a_1 \sum_{i=1}^{I} p_{i1} x_{it} = a_2 \sum_{i=1}^{I} p_{i2} x_{it} = \cdots = a_S \sum_{i=1}^{I} p_{iS} x_{it}$,   $t = 1, \ldots, T$
Table 1

Item Parameters and Information Function Values

Item      b_i      a_i     I_i(-1)   I_i(0)   I_i(1)

  1      0.576    0.695    0.091     0.116    0.118
  2     -0.442    1.109    0.280     0.290    0.172
  3     -0.824    0.823    0.168     0.151    0.101
  4      0.254    0.609    0.080     0.092    0.088
  5      0.419    1.213    0.189     0.345    0.326
  6     -0.017    1.138    0.240     0.324    0.236
  7     -0.245    0.549    0.072     0.075    0.067
  8      1.828    1.171    0.047     0.129    0.273
  9      1.109    0.892    0.091     0.157    0.198
 10     -0.080    0.879    0.165     0.193    0.155
 11     -1.708    1.384    0.380     0.151    0.043
 12      0.016    0.909    0.168     0.207    0.170
 13     -0.264    1.299    0.339     0.410    0.229
 14      0.063    0.936    0.173     0.219    0.182
Table 2

Results for Test Construction Methods 1-3

Example   Selected       n    I(-1)    I(0)     I(1)     y       y*

1a        9-10-12-14     4    0.597    0.776    0.705    0.376
          1-3-4-5-7      5    0.600    0.779    0.700    0.379   0.005
1b        3-7-9-13-14    5    0.843    1.012    0.777    0.612
          1-2-4-6-10     5    0.856    1.015    0.769    0.615   0.013
2         1-3-14         3    0.432    0.486    0.401    0.086
          4-10-12        3    0.413    0.492    0.413    0.092   0.019
3         6-7-14         3    0.485    0.618    0.485    0.218
          13-4-12        3    0.587    0.709    0.487    0.309   0.102

n: number of selected items

y: maximum distance between the actual and target test information
function values

y*: maximum distance between the test information function values
of both parallel tests
Abstract

The use of the minimax principle in IRT-based test construction is
proposed. It is shown how this results in test information
functions deviating less systematically from the target function
than for the usual criterion of minimal test length. Next, an
alternative minimax approach is presented. Under this approach, the
test constructor specifies only relative target values which serve
as constraints subject to which the algorithm maximizes the
information in the test. In the final part of the paper, some
practical constraints are considered (e.g., test composition,
administration time, mutually exclusive items, and curriculum
differences), and a description of how these constraints can be
included in the optimization model is presented.
Automated Test Construction
Using (Generalized) Minimax Programming

Although in IRT-based test construction a target information
function for the test is specified, the actual item selection
procedure usually has a different entity as its objective function.
Theunissen (1985), for instance, has proposed a binary programming
model for test construction in which the objective function
consists of minimization of the test length. In his model, a
branch-and-bound algorithm selects a test of minimal length subject
to the condition that, at a number of ability points chosen in
advance, the test information function lies above the target
function.

Practical experience in using models with minimization of test
length as the objective function shows that, for the usual item
response models, the information functions usually have a large
peak in the middle of the ability interval. The explanation of this
phenomenon is simple. Let $\theta_k$ (k = 1, ..., K) be the values
of the ability parameter considered in the model. Since the target
values for the information function at these points have to be met
by a minimum number of items, the algorithm will select items with
the "bulk of their information" in the interval
$[\theta_1, \theta_K]$. However, for the one- and two-parameter
logistic models the item information functions are symmetric about
their difficulty parameter values. Hence, a tendency exists to
select items located in the middle of the interval. (Due to the
presence of a guessing parameter, the item information functions
corresponding to the three-parameter model are skewed to the left
and items somewhat to the left of the interval are preferred.) This
tendency will be observed for most target functions in use in test
construction. An exception is the case of a U-shaped target with
large values at the extremes; then, obviously, the test will tend
to contain items not in the middle but at the ends of the interval.

The above phenomenon is not only less elegant but may also have
some practical consequences. For example, the fact that all items
tend to concentrate at a single point and not to be distributed
over the entire interval may be less desirable as regards test
content. Also, in case new tests for the same interval have to be
selected on a regular basis, the supply of items in the middle of
the interval may quickly be exhausted. Then the procedure no longer
meets the ideal of producing short tests.

This paper has a twofold goal. The first goal is to propose an
objective function of the minimax type to solve the above problem.
Although other remedies are possible, this objective function has
two other favorable properties: First, as will be shown below, the
minimax principle has a generalization that suggests a simple
experiment to elicit target information functions from test
constructors. It is believed that this experimental approach
provides a major advance in the attempt to solve the awkward
problem of specifying a target information function. Second, an
objective function of this type does not contain any test
parameters. Therefore, the properties of the test may be completely
controlled by manipulating appropriate constraints in the model. It
is the second goal of the paper to exemplify the use of this model
under a variety of practical constraints.



A Minimax Test Construction Model

The purpose of having a target information function for a test is
that at each of the points $\theta_k$ (k = 1, ..., K) the
information about the ability parameter will be close to some
prespecified value. Let $I_t(\theta_k)$ and $T(\theta_k)$ denote the
actual test information at $\theta_k$ and its target value,
respectively. As the test information function may approach the
target values from below as well as from above, a choice needs to
be made. It is henceforth assumed that the target function
specifies the minimum amount of information required from the test
and that $I_t(\theta_k)$ must approximate $T(\theta_k)$ from above.
It follows that the relevant quantities are the (non-negative)
values $\{I_t(\theta_k) - T(\theta_k); k = 1, \ldots, K\}$ and that
the objective function in the item selection model must guarantee
that they are minimal in some sense.

A direct attack on the problem of peaks in test information
functions is to minimize the largest deviation from the target
function subject to the condition that all deviations are
non-negative. This leads to the following criterion:

(1)   minimize $\max \{ I_t(\theta_k) - T(\theta_k); k = 1, \ldots, K \}$.
Although the minimax criterion specified in (1) seems to result in
non-linear optimization, it is a standard transformation in
mathematical programming to modify (1) so that it is in a linear
form (e.g., Wagner, 1975, sect. 14.8). Let y denote an arbitrary
upper bound to the set $\{I_t(\theta_k) - T(\theta_k); k = 1,
\ldots, K\}$ and let $I_i(\theta_k)$ be the value of the information
function of item i (i = 1, ..., I) at the point $\theta_k$. Now, if
$x_i$ is the decision variable as to whether ($x_i = 1$) or not
($x_i = 0$) to include item i on the test, a linear programming
model minimizing the largest deviation may be specified as follows:

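(2)   minimize y

subject to

(3)   $\sum_{i=1}^{I} I_i(\theta_k) x_i - y \leq T(\theta_k)$,   $k = 1, \ldots, K$

(4)   $\sum_{i=1}^{I} I_i(\theta_k) x_i \geq T(\theta_k)$,   $k = 1, \ldots, K$

(5)   $x_i \in \{0, 1\}$,   $i = 1, \ldots, I$.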

The constraint in (3) requires the deviation of
$I_t(\theta_k) = \sum_{i=1}^{I} I_i(\theta_k) x_i$ from
$T(\theta_k)$ to be no larger than the upper bound y; the constraint
in (4) stipulates that these deviations are non-negative. By
minimizing the upper bound y in (2) the test information function
tends to conform to the target function. Consequently, a test
information function with the smallest possible peak is produced
and the items in the test are spread out over the interval
$[\theta_1, \theta_K]$. The model specified in (2) through (5) can
be solved for $(y, x_1, \ldots, x_I)$ by one of the branch-and-bound
algorithms available for integer programming problems (Wagner,
1975, chap. 13).

It should be noted that the objective function specified in (2)
is just a dummy variable introduced to cast the minimax criterion
into a linear model. Hence, it does not contain any item or test
parameters. This provides the test constructor with the potential
for controlling any feature of the test that can be modeled as a
linear constraint. Examples of such modeling will be provided
below.
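
For a very small bank, the model in (2) through (5) can be checked by exhaustive enumeration instead of branch-and-bound. The following sketch assumes a hypothetical bank of four items with made-up information values at two ability points and a hypothetical target; for each subset that satisfies (4), the smallest y allowed by (3) is the largest deviation from the target.

```python
from itertools import combinations

# Hypothetical item information values I_i(theta_k) at K = 2 points.
ITEM_INFO = [
    (0.5, 0.1),
    (0.1, 0.5),
    (0.4, 0.4),
    (0.3, 0.3),
]
TARGET = (0.5, 0.5)  # hypothetical target values T(theta_k)
EPS = 1e-9           # tolerance for floating-point comparisons


def minimax_test(item_info, target):
    """Return (y, items) minimizing the largest deviation above target."""
    best = None
    n_items = len(item_info)
    for size in range(1, n_items + 1):
        for items in combinations(range(n_items), size):
            info = [sum(item_info[i][k] for i in items)
                    for k in range(len(target))]
            deviations = [v - t for v, t in zip(info, target)]
            if min(deviations) < -EPS:
                continue             # constraint (4) is violated
            y = max(deviations)      # smallest y satisfying (3)
            if best is None or y < best[0] - EPS:
                best = (y, items)
    return best


y, items = minimax_test(ITEM_INFO, TARGET)
print(items, round(y, 3))  # (0, 1) 0.1
```

In this toy bank the two items peaked at opposite ends of the scale are preferred over the two items located in the middle, which illustrates the spreading effect described above.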

An Alternative Minimax Model 

In IRT-based test construction it is assumed that the test
constructor is able to specify a target information function.
Although in general the target function of a test should be derived
on the basis of its intended use, the specification of such a
function is by no means an easy task. This section of the paper
describes a simple experiment that may be used to elicit
information about target functions from test constructors. An
alternative minimax model is then presented in which the elicited
information is used in item selection. Another approach to the
problem of specifying target information functions is given by
Kelderman (1987).

The suggested experimental approach consists of the following 
steps. First, the test constructor is faced with the ability scale 
underlying the item bank. This can be done by offering him or her a 
line displaying the contents of items with locations at some
well-chosen points. The same practice is used in scale-score reporting
of assessment data (e.g., Pandey, 1986). Then, the constructor is 
asked to select a number of scale points he or she wants to 
consider. The number of points and their spacing are free. Next, he 
or she is given, say, 100 chips and requested to distribute them 
over the scale points such that they reflect the relative 
distribution of information wanted from the test. The final step 
then is to ask the test constructor for the desired length of the 
test. The answer to this question can be facilitated by providing 
some statistics about the time typically needed by the group of 
examinees to complete items in the bank. 

Let $r_k$ be the number of chips the test constructor puts at
point $\theta_k$ (k = 1, ..., K). Now the idea is to characterize
the relative target information function by a series of lower
bounds $(r_1 y, \ldots, r_K y)$ in which y is a dummy variable to be
maximized subject to the constraint that test length is equal to
the value n specified by the test constructor. This leads to the
following model:
(6)   maximize y

subject to

(7)   $\sum_{i=1}^{I} I_i(\theta_k) x_i - r_k y \geq 0$,   $k = 1, \ldots, K$

(8)   $\sum_{i=1}^{I} x_i = n$

(9)   $x_i \in \{0, 1\}$,   $i = 1, \ldots, I$.

The constraints in (7) set a series of lower bounds, $r_k y$, to the
test information $I_t(\theta_k) = \sum_{i=1}^{I} I_i(\theta_k) x_i$
at each of the points $\theta_k$. The common factor y in these
bounds is maximized in (6). The constraint in (8) sets the test
length equal to n.
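
The model in (6) through (9) can likewise be sketched by enumeration for a toy bank. The item information values, the chip counts, and the test length below are all hypothetical; for a fixed selection of n items, the largest y permitted by (7) is the smallest ratio of test information to chips over the scale points.

```python
from itertools import combinations

# Hypothetical item information values I_i(theta_k) at K = 2 points.
ITEM_INFO = [(0.5, 0.1), (0.1, 0.5), (0.4, 0.4), (0.3, 0.3)]
CHIPS = (40, 60)   # hypothetical chip counts r_k (100 chips in total)
TEST_LENGTH = 2    # hypothetical test length n


def maximin_test(item_info, chips, n):
    """Return (y, items) maximizing the common factor y for n items."""
    best = None
    for items in combinations(range(len(item_info)), n):  # constraint (8)
        info = [sum(item_info[i][k] for i in items)
                for k in range(len(chips))]
        # Largest y allowed by (7); points without chips impose no bound.
        y = min(v / r for v, r in zip(info, chips) if r > 0)
        if best is None or y > best[0]:
            best = (y, items)
    return best


y, items = maximin_test(ITEM_INFO, CHIPS, TEST_LENGTH)
print(items, round(y, 4))  # (1, 2) 0.0125
```

With more chips placed on the second point, the selection shifts toward items that are informative there, while the test length stays fixed at n.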

Just as in the previous model, the present model also tends to
prevent the items in the test from concentrating in the middle of
the ability interval. The reason is simply that for each test with
an information function showing a large deviation from the target
function at one of the points $\theta_k$, it is likely that a test
with a series of uniformly larger lower bounds $r_k y$ could be
found by distributing the items more in accordance with the
relative weights $(r_1, \ldots, r_K)$.

A comparison between (2) through (5) and (6) through (9) shows
that the latter has K-1 fewer constraints. Nevertheless, it has the
additional potential for controlling the length of the test.

Some Practical Constraints 

For automated test construction to be practical, it is necessary to
provide control of features of the test other than just the
information function and the number of items. Since the previously
presented models are linear programming models, they can easily be
extended through the use of additional constraints, provided these
can be modeled as linear (in)equalities. In this section some
practical constraints are discussed. Throughout the discussion it
is assumed that (6) through (9) is the basic model.

Test Composition 

As already noted, for a sufficiently large bank of items, the
constraint in (8) controls the length of the test. The same
principle can be applied at the level of possible subtests,
providing the test constructor with the ability to control the
composition of the test. Let $V_j$ (j = 1, ..., J) be a subset of
items in the bank from which the test constructor wants $n_j < n$ in
the test. This is attained if the following equality is added to
the model:

(10)  $\sum_{i \in V_j} x_i = n_j$.
It is important to note that for a series of such constraints the
sets $V_j$ (j = 1, ..., J) do not need to be disjoint. This provides
the opportunity for controlling the composition of the test
simultaneously with respect to several dimensions. For example, an
item bank for English could be partitioned not only with respect to
its content (e.g., vocabulary, grammar, or reading comprehension),
but also to a behavioral dimension (e.g., knowledge of facts,
application of rules, or evaluation) or the format of its items
(e.g., multiple choice, completion, or matching). For each set in
these partitions the constraint in (10) is incorporated within the
model, with the restriction that the $n_j$'s are specified such that
the sum over all sets in the same partition is equal to n. If this
option is used, the constraint in (8) is redundant and may be
dropped.

Administration Time 

In a computerized testing environment, the time needed to solve the
items in the bank by the population of examinees of interest can
easily be monitored. Let $t_i$ be, e.g., the 95th percentile of the
distribution of time for item i in the population. Instead of
fixing the length of the test, the selection of the items could
also be based on the time limit, T, in force for the examinees. In
that case (8) is replaced by

(11)  $\sum_{i=1}^{I} t_i x_i \leq T$.

Analogous to (10), the composition of the test can be controlled by
introducing time limits at the subtest level.
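
As a minimal sketch, constraints (10) and (11) can be checked for a candidate selection as follows. The subsets, quotas, item times, and limits are hypothetical values chosen for the illustration, and the function names are not from the paper.

```python
def meets_composition(x, subsets, quotas):
    """Constraint (10): exactly n_j selected items from each subset V_j."""
    return all(sum(x[i] for i in subset) == quota
               for subset, quota in zip(subsets, quotas))


def within_time_limit(x, times, limit):
    """Constraint (11): summed item times may not exceed the limit T."""
    return sum(t * xi for t, xi in zip(times, x)) <= limit


x = [1, 0, 1, 1, 0]             # candidate selection of items 0, 2, 3
subsets = [{0, 1}, {2, 3, 4}]   # hypothetical content areas V_1, V_2
quotas = [1, 2]                 # hypothetical n_1, n_2
times = [60, 90, 45, 120, 30]   # hypothetical 95th-percentile times (s)

print(meets_composition(x, subsets, quotas))  # True
print(within_time_limit(x, times, 240))       # True  (225 <= 240)
print(within_time_limit(x, times, 200))       # False (225 > 200)
```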



Selection on Item Parameters 

Let $c_i$ be a positively valued numerical parameter representing a
feature of the items in the bank. Then it is possible to restrict
the selection of the items to those with $c_i \in [c_l, c_u]$ by
including the following set of inequalities in the model:

(12)  $c_i x_i \leq c_u$,   $i = 1, \ldots, I$

(13)  $c_i^{-1} x_i \leq c_l^{-1}$,   $i = 1, \ldots, I$,

where $c_u > c_l$.

Unlike (10), these constraints do not fix the length of
subtests. Therefore they can be used to give all items in the test
the same properties. At the same time, (10) can be used to compose
the test with different item properties.

If the frequency of administration of the items in the bank is
monitored, the constraints in (12) through (13) can be used to
restrict the selection of the items to certain frequencies. For
example, if the intention is to obtain uniform usage of items in
the bank, (12) can be used to set an upper bound for item use, thus
restricting the selection of items to those with lower usage.

It is also possible to substitute one of the parameters in the
item response model for $c_i$. In this way, the constraints can be
used, for example, to select items with values for the difficulty
parameters in a certain interval. For the Rasch model, this allows
for the selection of items based on their probabilities of success:
Let $\theta_0$ be the a priori known average ability of the group of
examinees, and let $[p_l, p_u]$ be the interval to which the
probabilities of success for the "average" examinees are
restricted. It follows that the items must have the values of the
difficulty parameter, $b_i$, in the interval $[b_l, b_u]$ determined
by $p(\theta_0; b_l) = p_u$ and $p(\theta_0; b_u) = p_l$, where p(.)
is the logistic function specified in the Rasch model. Selecting
items based on their probabilities of success for given examinees
may be desirable for instructional reasons.
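
Under the Rasch model the two equations for the interval bounds can be solved in closed form, since the difficulty equals the ability minus the logit of the success probability. The sketch below assumes this standard form of the Rasch model; the numerical values and function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of success under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def difficulty_interval(theta_0, p_l, p_u):
    """Solve p(theta_0; b_l) = p_u and p(theta_0; b_u) = p_l for b."""
    def logit(p):
        return math.log(p / (1.0 - p))

    b_l = theta_0 - logit(p_u)  # easiest admissible item
    b_u = theta_0 - logit(p_l)  # hardest admissible item
    return b_l, b_u


# Hypothetical example: average ability 0, success rates in [0.5, 0.8].
b_l, b_u = difficulty_interval(theta_0=0.0, p_l=0.5, p_u=0.8)
print(round(b_l, 3), round(b_u, 3))
```

The resulting bounds can then take the place of $c_l$ and $c_u$ when item difficulty is the parameter being restricted.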

Group-dependent Item Parameters 

If the item bank has to serve distinct groups of examinees, items
may have different properties for different groups. In such cases
it is natural to consider the parameter $c_i$ in (12) - (13) as
group dependent. In school settings, for instance, the recording of
the date of the final administration of item i to group
g = 1, ..., G may be useful. The constraint in (13), with $c_{gi}$
instead of $c_i$, then allows the selection of items for one group
that have not been used after a given date for other groups. Such
strategies may be instrumental in solving the problem of test
security.

If $c_{gi}$ is allowed to take only the values zero and one, it
can be used to adapt tests to curriculum differences between
groups. Let $c_{gi}$ indicate whether ($c_{gi} = 1$) or not
($c_{gi} = 0$) item i covers a part of the curriculum of group g.
Then the following constraint automatically suppresses the
administration of items to group g on topics for which instruction
is absent:

(14)  $x_i \leq c_{gi}$,   $i = 1, \ldots, I$.

Inclusion or Exclusion of Individual Items 

For some personal reason the test constructor may want to include
or exclude certain items from the test. As already noted by
Theunissen (1985) and Boekkooi-Timminga (1986), the following
constraints can be used for this purpose:

(15)  $x_i = 1$,   $i \in V_1$

(16)  $x_i = 0$,   $i \in V_0$

with $V_1 \cap V_0 = \emptyset$.

Inter-Item Dependencies

In some practical situations certain items are not allowed to be
administered on the same test. For instance, this may be the case
if some items contain a cue with respect to the solution of other
items. Suppose $i_0 = 1, \ldots, I_0$ indicates a set of mutually
exclusive items in the bank. Then, the following multiple-choice
constraint allows the selection of at most one item from this set:

(17)  $\sum_{i_0=1}^{I_0} x_{i_0} \leq 1$.

The opposite case occurs if the selection of one item entails the
necessity to select other items as well. This may occur if the
content of some items builds on that of other items. (The question
of whether one of the current response models could fit such items
is deliberately omitted.) It is also possible to model the presence
of this dependency between test items as a linear constraint.
Let $i_1 = 1, \ldots, I_1$ represent a set of dependent items in the
bank. The following equality guarantees the simultaneous inclusion
or exclusion of these items from the test:

(18)  $x_{i_1} = x_{i_1+1}$,   $i_1 = 1, \ldots, I_1 - 1$.

The last two constraints differ from those previously specified
in that they represent dependencies among items in the bank that
hold for all possibly generated tests. Therefore, they should be
specified when the item bank is designed and automatically inserted
in the model each time a test is constructed. Another approach to
the problem of inter-item dependencies, using Boolean algebra, is
given by Theunissen (1986).
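
The two dependency constraints can be expressed as simple checks on a 0/1 selection vector. This is only an illustrative sketch; the item sets and the function names are hypothetical.

```python
def at_most_one(x, exclusive_items):
    """Constraint (17): at most one item from a mutually exclusive set."""
    return sum(x[i] for i in exclusive_items) <= 1


def all_or_none(x, dependent_items):
    """Constraint (18): chained equalities x_a = x_b over the set."""
    items = list(dependent_items)
    return all(x[a] == x[b] for a, b in zip(items, items[1:]))


x = [1, 0, 0, 1, 1]          # candidate selection of items 0, 3, 4
exclusive = [0, 1, 2]        # hypothetical set of items that cue each other
dependent = [3, 4]           # hypothetical set of items that build on each other

print(at_most_one(x, exclusive))   # True: only item 0 is selected
print(all_or_none(x, dependent))   # True: items 3 and 4 are both selected
```

Because these sets describe the bank rather than a particular test, they could be stored with the bank and applied to every candidate selection.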

Discussion 

From (6) through (8), it is clear that the basic model in this
paper always has a feasible solution for $n \leq I$: The constraint
specified in (8) stipulates that n items are selected; from all
possible selections of this length, (6) through (7) result in the
choice of the one with an information function for which the lower
bounds $(r_1 y, \ldots, r_K y)$ are maximized. Thus, a sufficient
condition for a model extended with additional constraints to have
a feasible solution is that the intersection of their solution
spaces is non-empty. This should be taken into account when
specifying constraints in (10) through (18) as an addition to the
basic model. For example, if (10) is used in combination with (18),
it should be specified such that the lengths of the subtests are
consistent with the equality constraints in (18).

Strictly speaking, a solution to the models in the paper is just
a collection of test items. To make them into a test, the items
should be put into an appropriate order of administration. This
again can be considered as a problem of optimization subject to
constraints with respect to, e.g., item difficulty, administration
time, or topic structure. How this problem can be solved using a
linear programming model is the subject of another paper.

As a final comment it is noted that in a computerized test
system the models in this paper can also be used in an interactive
mode. In doing so, the system selects a test and requests the user
to indicate which items are appropriate and which are not. In the
next stage, the model is used to select a new version of the test,
but now with $x_i = 1$ for the items that have to be retained and
$x_i = 0$ for those that were rejected. The process is repeated
until all items are considered appropriate. Interactive use of the
models in this paper is recommended since it allows test
construction to be based on possible remaining constraints of
interest that cannot be modeled as linear (in)equalities.
Abstract

To construct a test from an item bank, items are selected from the
bank so that the test has a certain test information function. In
this paper procedures to assess target information functions for
the test are described. The probability that a certain student of
ability $\theta_1$ will erroneously obtain a higher estimate than a
more able student with ability $\theta_2$ can be derived from the
information function of the particular test. The procedures to
obtain the information function are based on the reverse relation:
from the probabilities of wrong-order mistakes (WOM),
information-function values are obtained. In a dialogue between the
test constructor and a computer, the procedure can be used to
obtain the information functions on one or more scales.
Some Procedures to Assess Target Information Functions

Item banks are used increasingly in educational testing (Choppin,
1976, 1981; Wright & Bell, 1984). An item bank contains a large
number of test items relevant for a particular curriculum. From an
item bank a teacher may select a set of items to measure the
ability of a group of students. In this selection process two
problems are encountered.

The first problem is that, in general, an item bank will not
consist of a single homogeneous set of items fitting a
one-dimensional item response theory model. Typically, a bank will
contain different homogeneous scales measuring the effects of
particular elements of the curriculum. Bock, Mislevy and Woodson
(1982) call these elements indivisible curricular elements. In
selecting the items to be used in a test, the teacher has to decide
which indivisible curricular elements have to be represented in the
test and with what weight this has to be done. This is a problem of
content validity (Thorndike 1982, ch. 7).

Secondly, for each indivisible curricular element it has to be
decided how many items of what levels of difficulty must be
included in the test. For example, if for a particular element only
a low level of ability is required, easy items have to be included
in the test.

Birnbaum (1968) has pointed out that information functions can
be used for test construction. If, for a given latent trait, a
target information function is known, the desired test can be
constructed by selecting items in such a way that the information
function for the test approximates the target function. Lord
(1980), Theunissen (1985) and Boekkooi-Timminga (1986) describe
methods to do this. Theunissen (1985) describes a procedure based
on integer programming, a special branch of linear programming.
Boekkooi-Timminga (1986) uses integer programming to construct
several tests simultaneously starting with several information
functions for different tests.

To employ these methods, for each of the scales a target
information function must be known. To date, however, no
satisfactory method to specify target information functions is
available.

In this paper, first the concept of an information function and
its properties are described. Some interpretations of this concept
are discussed and an interpretable function of information is
chosen: the probability of a wrong order of the ability estimates
of two subjects.

Using this interpretation, a paired comparison experiment is
proposed that yields the values of the information function for
different scale points by comparing their wrong order
probabilities. These experiments can be used in an interactive
procedure to specify information functions for scales.

Test Information Functions 

Consider a test measuring $q$ traits, where each trait $r$ ($= 1, \ldots, q$)
is measured by $n_r$ items with responses $U_r = (U_{r1}, \ldots, U_{rn_r})$. Each
item response $u_{ri}$ can take values 0 (wrong, negative) or 1 (right,
positive). It is assumed that the responses are locally independent:

$$P(U_r = u_r \mid \theta^{(r)}) = \prod_{i=1}^{n_r} P(U_{ri} = u_{ri} \mid \theta^{(r)}), \qquad (1)$$

that is, the dependence between the item responses is wholly
explained by their dependence on a unidimensional latent trait
$\theta^{(r)}$. Several item response models are proposed for
$P(U_{ri} = u_{ri} \mid \theta^{(r)})$. For example, Rasch (1980) gives the model

$$P(U_i = u_i \mid \theta) = \frac{\exp(u_i(\theta - \delta_i))}{1 + \exp(\theta - \delta_i)}, \qquad (2)$$

where $\delta_i$ is a parameter for the difficulty of item $i$ and the index
$r$ denoting the trait is dropped.
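As an illustration of equations (1) and (2), the following minimal Python sketch evaluates the Rasch response probability and, under local independence, the probability of a whole response pattern. The function names `rasch_prob` and `pattern_prob` are illustrative choices and do not come from the paper.

```python
import math


def rasch_prob(u, theta, delta):
    """P(U_i = u | theta) under the Rasch model, equation (2)."""
    if u not in (0, 1):
        raise ValueError("a response must be 0 or 1")
    return math.exp(u * (theta - delta)) / (1.0 + math.exp(theta - delta))


def pattern_prob(responses, theta, deltas):
    """Probability of a response pattern under local independence, equation (1)."""
    prob = 1.0
    for u, delta in zip(responses, deltas):
        prob *= rasch_prob(u, theta, delta)
    return prob
```

When ability equals item difficulty, `rasch_prob(1, theta, delta)` is one half, which is the usual reading of the difficulty parameter.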

The amount of information about $\theta$ contained in the subtest $U_r$
is defined as (Kendall & Stuart, 1978, p. 10):

$$I(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \log P(U_r = u_r \mid \theta)\right)^2\right]. \qquad (3)$$

If the items are locally independent (1), we have (Lord, 1980):

$$I(\theta) = \sum_{i=1}^{n} I_i(\theta), \qquad (4)$$

where

$$I_i(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \log P(U_i = u_i \mid \theta)\right)^2\right] \qquad (5)$$

is the information about $\theta$ in the response to item $i$. For example,
in the Rasch model the item information function is

$$I_i(\theta) = [2 + \exp(\delta_i - \theta) + \exp(\theta - \delta_i)]^{-1}. \qquad (6)$$

If $t$ is an unbiased estimator of some function $\tau(\theta)$ of $\theta$, the
Cramér-Rao inequality (Kendall & Stuart, 1978, p. 10; Lord, 1980,
p. 71) states that

$$\mathrm{Var}(t \mid \theta) \geq [\tau'(\theta)]^2 / I(\theta). \qquad (7)$$

If $\hat\theta$ is a maximum likelihood estimator of $\theta$ we have
asymptotically (Lord, 1980, p. 71):

$$\mathrm{Var}(\hat\theta \mid \theta) = I^{-1}(\theta). \qquad (8)$$

Furthermore, $\hat\theta$ has asymptotically a normal distribution with
expectation $\theta$ and variance $I^{-1}(\theta)$ (Oosterloo, 1984).
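A small Python sketch of equations (4), (6) and (8) for the Rasch model may make these quantities concrete. The function names are illustrative only, and the standard error is the asymptotic one implied by (8).

```python
import math


def rasch_item_information(theta, delta):
    """Item information in the Rasch model, equation (6)."""
    return 1.0 / (2.0 + math.exp(delta - theta) + math.exp(theta - delta))


def total_information(theta, deltas):
    """Information of a set of locally independent items, equation (4)."""
    return sum(rasch_item_information(theta, delta) for delta in deltas)


def asymptotic_standard_error(theta, deltas):
    """Square root of the asymptotic variance in equation (8)."""
    return 1.0 / math.sqrt(total_information(theta, deltas))
```

For instance, sixteen items whose difficulty equals the ability level each contribute 0.25, giving a test information of 4 and an asymptotic standard error of 0.5.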

Interpretations of Test Information 

To obtain a target information function $I(\theta)$ from a test constructor,
some suitable interpretation of both the latent trait value
$\theta$ and its associated information value $I(\theta)$ must be available.
Unfortunately, neither $\theta$ nor $I(\theta)$ has a straightforward interpretation.
Before a procedure for obtaining information functions can
be constructed, we must therefore relate $\theta$ and $I(\theta)$ to quantities
that do have an interpretation familiar to the test constructor.

Interpretation of Ability Level 

Firstly, the ability level can be expressed in terms of the
expected observed score (true score) of some subtest with which the
test constructor is familiar. Let $A$ be the set of items of this
test, then

$$\tau_1(\theta) = \sum_{i \in A} P(U_i = 1 \mid \theta) \qquad (9)$$

is the true score, where $P(U_i = 1 \mid \theta)$ is some IRT model. In general,
however, it cannot be expected that test constructors are very
familiar with specific tests, let alone homogeneous subtests
pertaining to indivisible curricular elements.

A second way to give an interpretation to the ability level is
to relate $\theta$ to percentiles for a reference population with which
the test constructor is familiar, i.e.

$$\tau_2(\theta) = 100\,F(\theta), \qquad (10)$$

where $\tau_2(\theta)$ is the percentile point for ability level $\theta$ and $F$ is
the cumulative distribution function of $\theta$ in the population of interest,
e.g. students of a certain grade level in a certain school type.
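The two re-expressions of ability in (9) and (10) can be sketched in Python as follows. Two assumptions are made purely for illustration and are not part of the paper: the IRT model in (9) is taken to be the Rasch model, and the population distribution F in (10) is taken to be normal with a given mean and standard deviation.

```python
import math
from statistics import NormalDist


def true_score(theta, deltas):
    """Equation (9), assuming Rasch items with difficulties `deltas`."""
    return sum(1.0 / (1.0 + math.exp(-(theta - delta))) for delta in deltas)


def percentile(theta, mean=0.0, sd=1.0):
    """Equation (10), assuming a normal ability distribution in the population."""
    return 100.0 * NormalDist(mean, sd).cdf(theta)
```

Under these assumptions an ability of 0 on a four-item subtest with difficulties 0 corresponds to a true score of 2 and to the 50th percentile of a standard normal population.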

In general, it may be expected that test constructors are
more familiar with subpopulations than with specific tests.
Consequently, percentile points may be the preferred way to express
ability level. In the application of percentile points, however, a
good description of the trait to be measured should be given. This
trait can be described verbally in terms of teaching materials or
in terms of test items.

A third way to give an interpretable representation of different
ability levels is to provide test items with difficulty
levels corresponding to the particular $\theta$ level. Instead of test
items, mean ability levels of particular reference groups can also
be used.

Interpretation of Information 

A more difficult problem is the one of interpreting test information.
For functions $\tau$ of $\theta$ we can use the relation



which can be interpreted as "the slope of the regression of $t$ (i.e.,
the observed value of $\tau(\theta)$) on $\theta$ relative to the standard error of
measurement of $t$ for fixed $\theta$" (Lord, 1980, p. 67). This interpretation
can be used for both true scores (9) and percentiles (10), but
it has the drawback of still referring to a $\theta$ scale which is not
interpretable. Moreover, the "standard error of measurement" is
itself not an easily interpretable quantity.

A second way to arrive at an interpretable quantity from which
information values can be obtained is to use the property that for a
latent trait value $\theta$ the estimator $\hat\theta$ has an asymptotic normal
distribution with variance $I^{-1}(\theta)$. The length of the asymptotic
confidence interval is proportional to $I^{-1/2}(\theta)$, so that the .95
confidence interval is $\hat\theta \pm 1.96\, I^{-1/2}(\theta)$. The interpretation of a
confidence interval, "The probability that the interval covers the
true value is .95", is not easy to comprehend for test constructors
who are unfamiliar with statistics.
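For concreteness, the asymptotic confidence interval can be computed from an information value as in the sketch below. The function name and the `level` parameter are illustrative additions; the normal quantile comes from Python's standard library.

```python
import math
from statistics import NormalDist


def confidence_interval(theta_hat, information, level=0.95):
    """Asymptotic interval theta_hat +/- z * I^(-1/2), with z = 1.96 for level .95."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z / math.sqrt(information)
    return theta_hat - half_width, theta_hat + half_width
```

With an information value of 4 the .95 interval has a half-width of about 0.98 on the latent scale.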

A third way to derive an interpretable quantity from test
information is as follows. Consider two individuals with true ability
levels $\theta_1$ and $\theta_2$, respectively, where the second individual is more
able than the first. If the values of the target information function
for $\theta_1$ and $\theta_2$ are small, the variances of the estimated ability
levels $\hat\theta_1$ and $\hat\theta_2$ are large. In that case, the probability that
the first individual is erroneously estimated to be more able than
the second individual also becomes large.

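The probability of such a wrong-order mistake (WOM) can be
derived as follows. Because $\hat\theta_1$ and $\hat\theta_2$ are ability scores obtained
from different persons who respond independently to the test
(Kreyszig 1970, p. 173),

$$\mathrm{Var}(\hat\theta_1 - \hat\theta_2) = \mathrm{Var}(\hat\theta_1) + \mathrm{Var}(\hat\theta_2). \qquad (12)$$

Furthermore, from the fact that $\hat\theta_1$ and $\hat\theta_2$ have an asymptotic normal
distribution with means $\theta_1$ and $\theta_2$ and variances $I^{-1}(\theta_1)$ and $I^{-1}(\theta_2)$,
respectively, we have

$$\hat\theta_1 - \hat\theta_2 \sim N\left(\theta_1 - \theta_2,\; I^{-1}(\theta_1) + I^{-1}(\theta_2)\right), \qquad (13)$$

so that the probability of putting both persons in the wrong order
of ability is:

$$\begin{aligned}
P(\hat\theta_1 > \hat\theta_2) &= P(\hat\theta_1 - \hat\theta_2 > 0) \\
&= P\left( \frac{(\hat\theta_1 - \hat\theta_2) - (\theta_1 - \theta_2)}{\sqrt{I^{-1}(\theta_1) + I^{-1}(\theta_2)}} > \frac{-(\theta_1 - \theta_2)}{\sqrt{I^{-1}(\theta_1) + I^{-1}(\theta_2)}} \right) \\
&= 1 - \Phi\left( \frac{-(\theta_1 - \theta_2)}{\sqrt{I^{-1}(\theta_1) + I^{-1}(\theta_2)}} \right) \\
&= \Phi\left( \frac{\theta_1 - \theta_2}{\sqrt{I^{-1}(\theta_1) + I^{-1}(\theta_2)}} \right), \qquad (14)
\end{aligned}$$

where $\Phi$ is the cumulative normal distribution function. From (14),

$$I^{-1}(\theta_1) + I^{-1}(\theta_2) = (\theta_1 - \theta_2)^2 \left[\Phi^{-1}(P(\hat\theta_1 > \hat\theta_2))\right]^{-2}, \qquad (15)$$

so that the sum of the reciprocals of the information values for
two known scale points can be obtained if the wrong-order
probabilities are known. Furthermore, from (15) we have:

$$I(\theta_2) = 1 \Big/ \left\{ (\theta_1 - \theta_2)^2 \left[\Phi^{-1}(P(\hat\theta_1 > \hat\theta_2))\right]^{-2} - I^{-1}(\theta_1) \right\}. \qquad (16)$$

If one of the information values is known, the wrong-order
probability can be used to determine the other. If both information values
are unknown but can be assumed to be approximately the same, we have
for their common value $I(\theta)$:

$$I(\theta) = 2 \Big/ \left\{ (\theta_1 - \theta_2)^2 \left[\Phi^{-1}(P(\hat\theta_1 > \hat\theta_2))\right]^{-2} \right\}. \qquad (17)$$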

Since measurement in education is concerned with comparisons
between persons on particular traits, and the fairness of these
comparisons is important to most test constructors, we will choose
an interpretation of information functions in terms of wrong-order
mistakes.
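The relations (14) through (17) can be sketched in Python as below. The function names are illustrative and not taken from the paper; the sketch assumes that the first scale point lies below the second, so that an admissible wrong-order probability lies strictly between 0 and .5.

```python
import math
from statistics import NormalDist

_PHI = NormalDist()  # standard normal distribution, supplies Phi and its inverse


def wom_probability(theta1, theta2, info1, info2):
    """Equation (14): probability of a wrong-order mistake, theta1 < theta2."""
    spread = math.sqrt(1.0 / info1 + 1.0 / info2)
    return _PHI.cdf((theta1 - theta2) / spread)


def reciprocal_information_sum(theta1, theta2, p_wom):
    """Equation (15): 1/I(theta1) + 1/I(theta2) from a wrong-order probability."""
    if not 0.0 < p_wom < 0.5:
        raise ValueError("wrong-order probability must lie in (0, 0.5)")
    z = _PHI.inv_cdf(p_wom)
    return (theta1 - theta2) ** 2 / z ** 2


def information_from_wom(theta1, theta2, p_wom, info1):
    """Equation (16): I(theta2) when I(theta1) is known."""
    remainder = reciprocal_information_sum(theta1, theta2, p_wom) - 1.0 / info1
    if remainder <= 0.0:
        raise ValueError("probability is unattainable given the known information value")
    return 1.0 / remainder


def common_information_from_wom(theta1, theta2, p_wom):
    """Equation (17): common information value of both scale points."""
    return 2.0 / reciprocal_information_sum(theta1, theta2, p_wom)
```

For two scale points one unit apart, each with information 4, the wrong-order probability is close to .08, and feeding that probability back into (16) or (17) returns the value 4.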

Assessment of Information through Wrong-order Mistakes

The above relationship can be used to give a sensible interpretation
to an information function for the precision of measurement
in terms of wrong-order probabilities. It can also be used the
other way around. That is, the desired measurement precision may be
specified in terms of wrong-order probabilities to obtain the
associated target information function.

To obtain a target information function for a certain scale, a
number of equidistant points may be chosen for which the information
values will be determined. Three to five such target information
values suffice for the construction of a test using integer
programming (Theunissen, 1985). For each pair of consecutive
scale points, a test constructor, typically a teacher, is then
asked to specify the wrong-order probability that (s)he is willing
to accept for that particular pair. The target information function
is then calculated using formula (16). Since formula (16) supposes
that one information value is already known, one more pair of scale
points must be judged to remove the indeterminacy.
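One way this calculation could be organized is sketched below in Python. The paper does not say which additional pair is judged; the sketch assumes, purely for illustration, that it is the pair formed by the first and third scale points. Each judgment then gives the sum of two reciprocal information values through (15), the three sums involving the first three points fix the first value, and the remaining values follow by chaining as in (16). All names are hypothetical.

```python
from statistics import NormalDist


def reciprocal_sum(theta_a, theta_b, p_wom):
    """Equation (15): sum of reciprocal information values for one judged pair."""
    if not 0.0 < p_wom < 0.5:
        raise ValueError("wrong-order probability must lie in (0, 0.5)")
    z = NormalDist().inv_cdf(p_wom)
    return (theta_a - theta_b) ** 2 / z ** 2


def target_information(points, p_consecutive, p_first_third):
    """Information values at ascending scale points from accepted WOM probabilities.

    p_consecutive[k] belongs to the pair (points[k], points[k + 1]);
    p_first_third belongs to the assumed extra pair (points[0], points[2]).
    """
    if len(points) < 3 or len(p_consecutive) != len(points) - 1:
        raise ValueError("need at least three points and one probability per consecutive pair")
    sums = [reciprocal_sum(points[k], points[k + 1], p)
            for k, p in enumerate(p_consecutive)]
    extra = reciprocal_sum(points[0], points[2], p_first_third)
    # Solve x1 + x2 = sums[0], x2 + x3 = sums[1], x1 + x3 = extra for x1 = 1/I(points[0]).
    reciprocals = [(sums[0] + extra - sums[1]) / 2.0]
    # Chain through the consecutive pairs, which is equation (16) in reciprocal form.
    for pair_sum in sums:
        reciprocals.append(pair_sum - reciprocals[-1])
    if min(reciprocals) <= 0.0:
        raise ValueError("the judged probabilities are mutually inconsistent")
    return [1.0 / x for x in reciprocals]
```

The consistency check matters in practice: probabilities supplied independently by a judge need not correspond to any positive information function.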

Although the idea of wrong order is easy to grasp, the procedure
is not entirely satisfactory. Firstly, producing a probability
is still a difficult task. The teacher might not be willing to
accept any order mistakes at all. On the other hand, he might
not be willing to pay the price of a very long test, or might not
believe that that is the price to pay.

Secondly, this procedure is restricted to one scale only. We
need, however, a procedure that simultaneously yields information
functions for a number of homogeneous subscales. The procedure must
give reliable information concerning the relative heights of these
information functions so that a composite test can be constructed.
The procedures presented in this section are
based on paired comparisons of wrong-order mistakes; they avoid
asking for probabilities and can be used to obtain information
about the relative heights of information functions for different
traits. This can be done through a procedure based on the comparison
of two pairs of items: the pair-completion experiment.

The pair-completion experiment is as follows. The test constructor
is confronted with three appropriately anchored scale
points. (S)he must now provide a fourth scale point so that a
wrong-order mistake in scale points one and two is equally serious
as a wrong-order mistake in scale points three and four. We will
now show that if the information values of the first three scale
points are known, the information value of the fourth scale point
can be calculated.

The basic idea is that two wrong-order mistakes of the same
seriousness are allowed to occur with the same probability, that is

$$P(\hat\theta_{11} > \hat\theta_{12}) = P(\hat\theta_{21} > \hat\theta_{22}), \qquad (18)$$

where $\hat\theta_{11}$ and $\hat\theta_{12}$ are the estimated latent trait values of the
first pair and $\hat\theta_{21}$ and $\hat\theta_{22}$ the estimated latent trait values
of the second pair.

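From (18) and (14) and the fact that the cumulative normal
distribution function has an inverse, we have

$$I_2^{-1}(\theta_{21}) + I_2^{-1}(\theta_{22}) = \left(\frac{\theta_{21} - \theta_{22}}{\theta_{11} - \theta_{12}}\right)^2 \left(I_1^{-1}(\theta_{11}) + I_1^{-1}(\theta_{12})\right), \qquad (19)$$

so that

$$I_2(\theta_{22}) = 1 \Big/ \left[ \left(\frac{\theta_{21} - \theta_{22}}{\theta_{11} - \theta_{12}}\right)^2 \left(I_1^{-1}(\theta_{11}) + I_1^{-1}(\theta_{12})\right) - I_2^{-1}(\theta_{21}) \right], \qquad (20)$$

where $I_1$ and $I_2$ are the information functions for the first and the
second pair, respectively. These information functions are indexed
because the first and the second pair of scale values may be from
different latent traits.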
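Equation (20) translates directly into a short Python function. The name and argument order are illustrative; the error raised for a non-positive denominator is an added safeguard, not part of the paper.

```python
def pair_completion_information(theta11, theta12, theta21, theta22,
                                info11, info12, info21):
    """Equation (20): information at theta22 from the three known values."""
    if theta11 == theta12:
        raise ValueError("the points of the first pair must differ")
    ratio_sq = ((theta21 - theta22) / (theta11 - theta12)) ** 2
    remainder = ratio_sq * (1.0 / info11 + 1.0 / info12) - 1.0 / info21
    if remainder <= 0.0:
        raise ValueError("the chosen fourth point is inconsistent with the known values")
    return 1.0 / remainder
```

If the constructor places the fourth point so that both pairs are equally far apart and all known values equal 4, the new value is also 4. Choosing a second pair twice as wide yields 4/7, since a wider gap needs less information to reach the same wrong-order probability.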

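If the information values of the given scale points $\theta_{11}$, $\theta_{12}$
and $\theta_{21}$ are known, the information value of the scale point $\theta_{22}$
that is provided by the test constructor (through his choice of an
item or a percentile point) can be calculated using (20). If the
information values in the second pair are assumed to be
approximately the same, we have from (19):

$$I_2(\theta_{21}) = I_2(\theta_{22}) = 2 \left(\frac{\theta_{11} - \theta_{12}}{\theta_{21} - \theta_{22}}\right)^2 \frac{1}{I_1^{-1}(\theta_{11}) + I_1^{-1}(\theta_{12})}. \qquad (21)$$

If the information values are assumed to be approximately the same
in both members of the first pair, i.e. $I_1(\theta_{11}) = I_1(\theta_{12})$, then

$$I_2(\theta_{22}) = 1 \Big/ \left[ 2 \left(\frac{\theta_{21} - \theta_{22}}{\theta_{11} - \theta_{12}}\right)^2 I_1^{-1}(\theta_{11}) - I_2^{-1}(\theta_{21}) \right], \qquad (22)$$

and if the information values within both pairs are approximately
the same, i.e. $I_1(\theta_{11}) = I_1(\theta_{12})$ and $I_2(\theta_{21}) = I_2(\theta_{22})$, then

$$I_2(\theta_{22}) = \left(\frac{\theta_{11} - \theta_{12}}{\theta_{21} - \theta_{22}}\right)^2 I_1(\theta_{11}). \qquad (23)$$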
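The three special cases can be written out as the following Python sketch. The function names are illustrative, and each function implements exactly one of the equal-information assumptions stated above.

```python
def info_second_pair_equal(theta11, theta12, theta21, theta22, info11, info12):
    """Equation (21): common value in the second pair, first pair fully known."""
    ratio_sq = ((theta11 - theta12) / (theta21 - theta22)) ** 2
    return 2.0 * ratio_sq / (1.0 / info11 + 1.0 / info12)


def info_first_pair_equal(theta11, theta12, theta21, theta22, info1, info21):
    """Equation (22): first pair shares the value info1, info21 is known."""
    ratio_sq = ((theta21 - theta22) / (theta11 - theta12)) ** 2
    remainder = 2.0 * ratio_sq / info1 - 1.0 / info21
    if remainder <= 0.0:
        raise ValueError("the chosen fourth point is inconsistent with the known values")
    return 1.0 / remainder


def info_both_pairs_equal(theta11, theta12, theta21, theta22, info1):
    """Equation (23): each pair has its own common information value."""
    return ((theta11 - theta12) / (theta21 - theta22)) ** 2 * info1
```

Equation (23) shows the scaling most plainly: doubling the distance within the second pair divides the required information by four.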

A Procedure to Determine Multiple Information Functions 

The pair-completion experiment can be used repeatedly to obtain 
information values for a number of points on one or more scales. It 
is supposed that the test constructor is seated in front of a 
terminal. The terminal presents the scale points appropriately 
anchored by item content and/or percentile points on a line. The 
following steps are now taken to determine the information 
functions. 

Step 1 The test constructor is asked to select an interval on one
of the scales around the point (s)he is most interested
in. The items corresponding to the end points of the
interval are highlighted.

Step 2 The test constructor is requested to give the WOM
probability for the end points of the interval. The end points
are assumed to have the same information values.

Step 3 A new point on the same scale is found using the pair
completion experiment. In this experiment, the points of
the first pair are the end points of the interval and the
first point of the second pair is one of the end points of
the interval. The information value of the new point is
calculated by formula (22).

Step 4 In the same way as in step three, information values are
determined for points at the right (left) side of the
scale, using the two rightmost (leftmost) points as points
with known information values. The information value of
the new point is calculated by formula (20). At each step,
the test constructor may decide to stop adding points.

Step 5 The test constructor is asked to select another scale and
a point on that scale (s)he is most interested in.

Step 6 The information value of this point and another point on
the new scale is found by executing the pair completion
experiment. In this experiment, the points of the first
pair are the end points of the interval on the old scale
and the first point of the second pair is the point
chosen on the new scale. Assuming that the information
values of both points on the new scale are the same, the
information values of the new points are calculated by
formula (23).

Step 7 Step 4 is repeated for the new scale.

Step 8 Steps 5 through 7 are repeated for another new scale until all
relevant scales are processed.
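The bookkeeping behind steps 2 through 6 can be sketched in Python as follows. This is only an outline under stated assumptions: the judge's answers (the WOM probability and each newly chosen scale point) are passed in as plain numbers rather than collected at a terminal, only extension to the right is shown (the left side is symmetric), and all function names are hypothetical.

```python
from statistics import NormalDist


def start_scale(left, right, p_wom):
    """Step 2: common information value of the interval end points, formula (17)."""
    if not 0.0 < p_wom < 0.5:
        raise ValueError("wrong-order probability must lie in (0, 0.5)")
    z = NormalDist().inv_cdf(p_wom)
    info = 2.0 * z ** 2 / (right - left) ** 2
    return [left, right], [info, info]


def extend_right(points, infos, new_point):
    """Steps 3 and 4: formula (20) with the two rightmost points as first pair
    and (rightmost point, new_point) as second pair. When the two rightmost
    values are equal this reduces to formula (22)."""
    a, b = points[-2], points[-1]
    ratio_sq = ((b - new_point) / (a - b)) ** 2
    remainder = ratio_sq * (1.0 / infos[-2] + 1.0 / infos[-1]) - 1.0 / infos[-1]
    if remainder <= 0.0:
        raise ValueError("the chosen point is inconsistent with the known values")
    return points + [new_point], infos + [1.0 / remainder]


def start_new_scale(old_left, old_right, old_info, new_left, new_right):
    """Step 6: formula (23), with the interval on the old scale as first pair."""
    ratio_sq = ((old_left - old_right) / (new_left - new_right)) ** 2
    info = ratio_sq * old_info
    return [new_left, new_right], [info, info]
```

A session would call `start_scale` once, `extend_right` for every point the constructor adds, and `start_new_scale` followed by further `extend_right` calls for each additional scale.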

In this fashion information values are obtained for a number of
points on a number of scales. Note that the test constructor
can determine how many points on the scale (s)he wants to have. In
many applications this number will be small, so that the procedure
need not take too much time.

Discussion 

The procedure presented in this paper yields a number of infor- 
mation functions to be used for constructing a test measuring a 
number of homogeneous traits. 

The procedure to determine information functions is carried out
only once. No replications are made to check the reliability of the
judgments. Checks could be built into the procedure but would
require a lot of time from the test constructor and may make the
method impractical.

The unit of precision by which the functions are measured is
chosen by the test constructor in step 5. This might be hard to
do. The size of this unit also has a large effect on the total
duration of the procedure. Therefore the unit might be chosen by
the system to limit the duration of the procedure.
In this paper a pair completion experiment is performed. In
some cases this will not be feasible because there might not be a
scale point available for which the WOM in the second pair is as
serious as a WOM in the first pair. For example, if it is
considered much more important to measure one trait than another
trait, a WOM for the standard pair in the first scale may be more
serious than a WOM for any pair of scale points in the second
trait. As a result the test constructor is not able to indicate a
new point on the second scale. In that case another experiment may
be devised where all four points are furnished by the system. The
test constructor is then asked to distribute a number of dollars
(say 100) over the two pairs in proportion to the seriousness of a
WOM in each of the pairs. The relative probability of a WOM is now
taken inversely proportional to the relative seriousness of a WOM,
so that the expected value of WOM seriousness is equal in both
pairs. For example, if a WOM in the second pair is considered twice
as serious, the probability of a WOM in the second pair becomes
half the probability of a WOM in the first pair. Formulas for
determining the information values of the points similar to (20)
through (22) can be derived along the lines of the present paper.
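One such derivation is sketched below in Python. It is not given in the paper and rests on two assumptions made here for illustration: the information values within each pair are approximately equal, and the common value for the first pair is already known. The WOM probability of the first pair then follows from (14), the probability for the second pair is scaled by the ratio of the dollar amounts, and the common information value of the second pair follows from (17). All names are hypothetical.

```python
import math
from statistics import NormalDist


def information_from_allocation(theta11, theta12, info1,
                                theta21, theta22, dollars1, dollars2):
    """Common information value of the second pair from a dollar allocation.

    Assumes equal information within each pair and a known value info1
    for the first pair (illustrative assumptions, not from the paper).
    """
    phi = NormalDist()
    # Equation (14) with equal information values in the first pair.
    p1 = phi.cdf(-abs(theta11 - theta12) / math.sqrt(2.0 / info1))
    # WOM probabilities are inversely proportional to seriousness,
    # so that p1 * dollars1 equals p2 * dollars2.
    p2 = p1 * dollars1 / dollars2
    if not 0.0 < p2 < 0.5:
        raise ValueError("allocation implies an inadmissible wrong-order probability")
    z2 = phi.inv_cdf(p2)
    # Equation (17) applied to the second pair.
    return 2.0 * z2 ** 2 / (theta21 - theta22) ** 2
```

With an even split of the dollars and equally spaced pairs the second pair receives the same information value as the first; giving the second pair twice the dollars halves its WOM probability and therefore raises its required information.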